Multi-level Disambiguation Grammar Inferred from English Corpus, Treebank, and Dictionary

Authors

Eric Atwell

Simon Arnfield

George Demetriou

Steve Hanlon

John Hughes

Uwe Jost

Rob Pocock

Clive Souter

Joerg Ueberla

Abstract

In this paper we show that Grammatical Inference is applicable to Natural Language Processing. Given the wide and complex range of structures appearing in an unrestricted Natural Language like English, full Grammatical Inference, yielding a comprehensive syntactic and semantic definition of English, is too much to hope for at present. Instead, we focus on techniques for ambiguity resolution by probabilistic ranking, which does not require a full formal Chomskyan grammar. We give a short overview of the different levels and methods being investigated at CCALAS for probabilistic ranking of candidates in ambiguous English input.

Similar resources

Disambiguating Compound Nouns for a Dynamic HPSG Treebank of Wall Street Journal Texts

The aim of this paper is twofold. We focus, on the one hand, on the task of dynamically annotating English compound nouns, and on the other hand we propose disambiguation methods and techniques which facilitate the annotation task. Both the aforementioned are part of a larger on-going effort which aims to create HPSG annotation for the texts from the Wall Street Journal (henceforward WSJ) secti...

Sejong Korean Corpora in the Making

The 21st Century Sejong Project is a comprehensive project aiming to build various kinds of language resources including Korean corpora, comparable to BNC (Aston & Burnard, 1998), and Korean electronic dictionaries. The project was conceived of in 1997 and started in 1998 as a 10-year long-term project. By 2003, we completed 6 years of our work. The Sejong Corpora are a collection of raw corpor...

Towards an LFG parser for Polish: An exercise in parasitic grammar development

While it is possible to build a formal grammar manually from scratch or, going to another extreme, to derive it automatically from a treebank, the development of the LFG grammar of Polish presented in this paper is different from both of these methods as it relies on extensive reuse of existing language resources for Polish. LFG grammars minimally provide two levels of representation: constitue...

A new semantically annotated corpus with syntactic-semantic and cross-lingual senses

In this article, we describe a new sense-tagged corpus for Word Sense Disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Gramm...

The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking

Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. In building the INESS Norwegian treebank, it is often the case that necessary lexical information is missing in the morphology or lexicon. The approach used to build the treebank is incremental parsebanking; a corpus is parsed with an ex...

Publication date: 2007
